{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Combinatorial Optimization\n", "## Introduction\n", "Combinatorial optimization gathers different types of optimization problems where the decision maker needs to select an alternative from a finite set of possible alternatives with different outcomes, given a set of requirements (i.e. constraints). Some examples of problems in combinatorial optimization are:\n", "\n", "- **Grouping**: Find a subset of members within a set that optimizes an objective function (e.g. minimises costs or maximises profits) given a set of constraints. For instance, select the best starting 11 in a football team given their performance statistics.\n", "\n", "- **Ordering**: Find the optimal ordering of the members of a set that maximise or minimise an objective function with some constraints. For instance the optimal sequence of production orders that minimises production time.\n", "\n", "- **Assignment**: Assign members of a set to members of another set to optimise an objective function given a set of constraints. For instance, assign tasks to employees at a minimum cost\n", "\n", "Combinatorial optimization is strongly connected with set theory and the concept of **search**, or (iteratively) explore the set to find the optimal solution. Heuristics will explore the set of alternatives looking for the optimal solution. \n", "\n", "The following types of problems are outstanding instances of combinatorial optimization problems: \n", "\n", "- **Integer Programming**: Any Linear Programming Problems with Integer or binary variables. When this happens, the feasibility region and objective function are no longer convex, and we cannot apply the same principles and methods as with CLP. When we combine Integer, Binary and Continuous variables, we are dealing with *Mixed Integer Programming (MIP)*. MIP covers a wide range of OR problems, which we will be covering extensively in this unit.\n", "- **Travelling salesman problems**: In the classic formulation of the Travelling Salesman Problem (TSP), a salesman needs to visit a set of cities and wants to find the optimal *route* that would take him through all cities and back to his/her original departure city. Thus, given a list of cities and the distances between each pair of cities, the TSP aims to determine what is the shortest possible route that visits each city and returns to the origin city. The TSP is thus an ordering problem. There are many OR problems, specially in logistics and manufacturing that can be modeled as TSP. \n", "- **Knapsack problems**: In knapsack problems, given a set of items, each with a weight and a value, we need to determine the number of each item to include in a collection so that the total value is maximised while the total weight is less than or equal than the maximum weight the knapsack can hold. \n", "- **Minimum Spanning tree**: Minimum Spanning Trees (MST) problems have a wide number of applications in fields like logistics or telecommunications when we deal with *networks* that interconnect different objects or *edges* through *vertices*. In this context, MST are grouping problems where we want to find a subset of *edges* in a *graph* that interconnects certain *vertices* at a minimum cost. We will address MST problems, among others, in *Network Theory*.\n", "\n", "As you can already imagine by now, there is a really wide range of applications for combinatorial optimization. Some examples are **Planning and Scheduling** of activities, **product and process design**, portfolio **selection**, networking and routing (Internet traffic, airlines routes, logistics).\n", "\n", "## Types of techniques\n", "Given the importance of the field, scientist have developed many techniques to solve combinatorial optimization problems over the years. They can be classified into the following categories:\n", "\n", "- **Numeric Techniques**: Numeric techniques gather algorithms that provide an optimal solution by exploring all possible options. The simplest example is brute force, although there are many other algorithms that fall into this category. Whenever you use numeric functions like Python´s Numpy to compute a minimum, maximum, or ordering of a set, internally the library is applying a numeric technique to compute the result. As we will soon see, for problem instances where the set is large, it might not be computationally efficient to rely solely on numeric techniques, and that is why we have other methods. Numeric techniques are out of the scope of this unit.\n", "- **Problem-solving heuristics**: These techniques find a (sub-)optimal solution in a given number of steps. We will focus on some of these techniques in this unit. Some examples are:\n", " - **State space search**: These algorithms test successive combinations (or states) of the decision variables iteratively, in such a way that at every step the solution is closer to the optimal solution, until the optimum solution is found. One example of search heuristic is the Branch and Bound algorithm that will be covered in this unit.\n", " - **Greedy algorithms**: Greedy algorithms first sort the members of the set according to their impact in the objective function, and add them to the solution iteratively. \n", " - **Genetic algorithms**: Explore the possible solutions combining the features of an initial set of possible solutions, selecting the fittest candidates to combine their features in the next iteration.\n", "- **Graph Theory**: Graph theory is not really a type of technique, but rather, a separate field of mathematics dedicated to the study of relations between objects through mathematical structures known as graphs. Many combinatorial optimization problems can be easily modeled through these structures, and there are many efficient algorithms that effectively compute graph-like structures to find MSTs or **shortest paths** which provide the optimal solution to our problem.\n", "\n", "## Efficiency of Techniques\n", "But why do we need this plethora of techniques in the first place? The main reason is that it is not feasible to use numeric techniques in relatively large instances of problems, because the number of possible combinations **grows exponentially** with the number of members in the sets. The image below, taken from [Dynamic logic by Harel et al., 2000](https://ieeexplore.ieee.org/servlet/opac?bknumber=6267400) highlights this fact:\n", "\n", "![complexity](img/complexity.png)\n", "\n", "As shown in the graph, for relatively medium instances of combinatorial optimization problems like Integer Programming, the time required to explore all possible combinations might be larger than time itself! We need techniques that need fewer operations to find an optimal solution, or that at least can provide a suboptimal solution in a given interval of time.\n", "\n", "Not all techniques provide results in the same number of operations, that is, not all techniques are equally **efficient**. An informal definition of an efficient algorithm is an algorithm in which the time required to solve a problem instance does not increase too fast as the problem size increases. Below, there is a formal definition of efficiency:\n", "\n", "> An algorithm is efficient (good, polynomial ) if the time necessary to solve any instance of input size L is bounded above by a polynomial function of L. (Thus it does not increase as fast as an exponential function such as $2^L$.)\n", "\n", "The [Big O Notation](https://en.wikipedia.org/wiki/Big_O_notation) is very handy to compare the efficiency of algorithms. For instance, an algorithm that solves a problem of size $L$ in a time proportional to the square of $L$ is a polynomial algorithm in $\\mathcal{O}(n^2)$, where as an algorithm that solves the same problem in an exponential time proportional to $2^L$ is noted as $\\mathcal{O}(2^L)$. The latter is less efficient because the time required to solve the problem increases faster as $L$ becomes larger.\n", "An important consideration that needs to be made at this point is that efficiency does not explicitly mention the number of operations, but rather the computing time. This has different implications. The first is that algorithms are formulated in mathematical language, and we need a computing model to translate operations into computational time. \n", "In complexity theory, which is the field of mathematics dedicated to the analysis of the complexity of problems, the **Turing Machine**, which is a mathematical model of a “perfect” machine is used to define time as a function of the number of operations in the definitions above of efficiency above. \n", "\n", "Based on this definition of a computational machine model, problems can be classified according to their complexity, in a classification or taxonomy of problems.\n", "\n", "### Classification of problems in complexity theory\n", "Based on the definitions above, a problem is as complex as the most efficient algorithm devised to solve it. An algorithm is considered efficient if it can solve an instance of a problem of input size L in a polynomial function of L. But what about other problems? Are there different categories of complex problems?\n", "Complexity theory is a field of mathematics that focuses precisely on the analysis of the complexity of problems, and its findings help scientist guide the search for new algorithms and mathematical techniques. The following sections describe the different problem classes, which are categories defined for problems according to their complexity. \n", "\n", "#### Class Polynomial - Class P (Easy problems)\n", "\n", "A problem is in the class P (or polynomially solvable) if it admits a polynomial algorithm. This category gathers all \n", "problems for which we have found an algorithm which is able to find an efficient algorithm, able to solve it in a \n", "number of operations which is a polynomial function of the size of the problem. \n", "\n", "#### Class Non-deterministic Polynomial time - Class NP (Possibly easy problems)\n", "A problem is in the class non-deterministic polynomial (NP) if the problem can be verified in polynomial time, meaning \n", "that we can find a bound for the optimal solution in polynomial time. Think for instance of a maximization problem,\n", "if we can find an upper bound for the objective function in polynomial time, we might not know which is the optimal \n", "solution, but we can already use this information in our problem, as we know that the optimal value will not exceed the \n", "upper bound. \n", "\n", "From this, we can define the **associated decision problem** as a problem that allows us to verify if a given number is \n", "a valid bound for our problem. If the result of the associated decision problem is True (meaning, yes, the provided \n", "number is a valid bound), then we can use this information to provide an approximation to the optimal solution. \n", "\n", "Problems for which there is an algorithm that solves the associated decision problem in polynomial time are classified \n", "as NP problems. \n", "\n", "#### Class Non-deterministic Polynomial time Hard - Class NP-Hard (Possibly Hard Problems)\n", "##### Reduction\n", "To understand the NP-Hard category, we first need to introduce the concept of **reduction**. A reduction is basically \n", "an algorithm that allows us to convert an instance of a problem *A* into instances of another problem \n", "*B*. That is, we can use problem *B* to simulate and possibly fin\n", "##### Formal definition\n", "Formally, a problem *H* is NP-Hard if there is a **Polynomial** algorithm to reduce *every* problem *L* in class NP to it.\n", "\n", "Let us use the following figure to illustrate this:\n", "\n", "![NP reduction](./img/NP_H_reduction.png){width=50%}\n", "\n", "If a problem *L* can be reduced to *H* in polynomial time, and we can find a solution for *H* using an algorithm *B* in \n", "one unit of time, then we can solve *L* in polynomial time with an algorithm *A* which basically reduces *L* to *H* and \n", "then uses *B* to solve the problem. \n", "\n", "The Traveling Salesman Problem (TSP) is an example of an NP-Hard problem. \n", "\n", "##### Informal definition\n", "Based on the definition above, we can argue that NP-Hard problems are at least as hard as the hardest NP problem, since \n", "we can reduce every problem in NP to it in polynomial time, it follows that algorithms for NP-Hard problems are at least \n", "as efficient as algorithms for NP problems. \n", "\n", "#### Class Non-Deterministic Polynomial time class Complete - NP-Complete (Hard Problems)\n", "Formally, a problem is NP-Complete if: \n", "\n", "- A given solution van be verified in polynomial time (that is, it is an NP Problem). \n", "- It can be used to simulate any other problem for which we can verify that a solution in polynomial time (that is, it \n", "is an NP-Hard problem). \n", "\n", "It follows that, if a problem is both NP and NP-Hard, it can be used to validate the solution of the *complete* NP class\n", "\n", "Note however that although solutions for NP-Problems can be verified in polynomial time, this does not imply that an \n", "optimal solution can be found in polynomial time. That is, we can use **heuristics** to verify a bound and use this \n", "bound as an approximation of the optimal solution, and we can do this efficiently in polynomial time, but we cannot find \n", "an optimal solution in polynomial time, which would in turn imply that the problem is in class P.\n", "\n", "In fact, determining that every NP-complete problem is also a class P problem is an open question and one of the \n", "[Millennium Prize Problems](https://en.wikipedia.org/wiki/Millennium_Prize_Problems). Since, every NP-complete problem\n", "is *complete*, in the sense that it can be used to simulate any NP problem, if a NP-complete problem can be solved in \n", "polynomial time, then every NP problem can!\n", "\n", "![ Euler diagram](img/Euler_diagram.png){width=70%}\n", "\n", "(Source [wikipedia](https://en.wikipedia.org/wiki/NP-completeness#/media/File:P_np_np-complete_np-hard.svg) by [Behnam Esfahbod](href=\"//commons.wikimedia.org/wiki/User:Behnam) \n", " [BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0) [Link](https://commons.wikimedia.org/w/index.php?curid=3532181))\n", "\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "source": [], "metadata": { "collapsed": false } } } }, "nbformat": 4, "nbformat_minor": 0 }